Identifying multimorbidity clusters in an unselected population of hospitalised patients

Multimorbidity (multiple coexisting chronic health conditions) is common and increasing worldwide, and makes care challenging for both patients and healthcare systems. To ensure care is patient-centred rather than specialty-centred, it is important to know which conditions commonly occur together and to identify the corresponding patient profile. To date, no studies have described multimorbidity clusters within an unselected hospital population. Our aim was to identify and characterise multimorbidity clusters in a large, unselected hospitalised patient population. Linked inpatient hospital episode data were used to identify adults admitted to hospital in Grampian, Scotland in 2014 who had ≥ 2 of 30 chronic conditions diagnosed in the 5 years prior. Cluster analysis (Gower distance and Partitioning around Medoids) was used to identify groups of patients with similar conditions. Clusters of conditions were defined based on clinical review and assessment of prevalence within patient groups and labelled according to the most prevalent condition. Patient profiles for each group were described by age, sex, admission type, deprivation and urban–rural area of residence. Of 41,545 hospitalised patients, 11,389 (27%) had ≥ 2 conditions. Ten clusters of conditions were identified: hypertension; asthma; alcohol misuse; chronic kidney disease and diabetes; chronic kidney disease; chronic pain; cancer; chronic heart failure; diabetes; hypothyroidism. Median age ranged from 51 years (alcohol misuse) to 79 years (chronic heart failure). The proportion of women was higher in the chronic pain and hypothyroidism clusters. The proportion of patients from the most deprived quintile of the population ranged from 6% (hypertension) to 14% (alcohol misuse). Identifying clusters of conditions in hospital patients is a first step towards identifying opportunities to target patient-centred care towards people with unmet needs, leading to improved outcomes and increased efficiency.
Here we have demonstrated the face validity of cluster analysis as an exploratory method for identifying clusters of conditions in hospitalised patients with multimorbidity.

www.nature.com/scientificreports/ commonly co-occur 3,4 . This will enable us to anticipate the specific health needs of, and implications for, patients with particular conditions in combination. Cluster analysis is a statistical technique that categorises items or properties into groups so that items in the same group are more statistically similar than those items in other groups. Our literature search highlighted that cluster analysis has previously been used to identify clusters of conditions in individuals from the general population, presenting in primary care, within narrow specialty subsets, or focussing specifically on older age groups 5 . We identified few studies that included hospitalised patients; three of which focussed on patients ≥ 65 years [6][7][8] , and one focussed on medical patients 9 . To our knowledge, no previous study has identified which conditions commonly cluster among unselected patients presenting to hospital and yet this is a setting of high strain on health systems globally.
As a first step to understanding the implications of disease clusters, we aimed to identify and characterise multimorbidity clusters in a cohort of patients hospitalised in the Grampian region of Scotland. This builds on our previous work describing the overall extent of hospital multimorbidity 10,11 .

Methods
Study design and setting. This study was prospectively preregistered on the Open Science Framework and is reported as per RECORD guidelines 12 . This is a population-based observational study using linked electronic health records carried out in a secondary care setting in a single health region in north-east Scotland (Grampian region, total population 2014, 584,220 13 ). The region consists of one large urban centre and is spread over approximately 3000 square miles of city, town, village and rural communities 14 . Full details have been previously published 10,11 .

Data sources. We used inpatient hospital episode data, namely the Scottish Morbidity Record (SMR) 15 , from general/acute (SMR01) and psychiatric (SMR04) admissions, from the years 2009-2014. SMR is an episode-based patient record relating to all patients discharged from hospital in Scotland. SMR data is collated in a national database, managed by Information Services Division Scotland 16 , and data is returned to each regional health authority on an ongoing basis. Data collected includes patient identifiable and demographic details, episode management details, general clinical information and death data. Clinical information is recorded as main diagnosis and up to five other significant diagnoses and coded using the World Health Organization's International Classification of Diseases (ICD-10).

Study population.
Adult patients (≥ 18 years) admitted to any hospital as an inpatient during 2014, in a single regional health authority (NHS Grampian) were included. A patient's first admission in 2014 was classified as their "index admission", and the admission date was classified as their "index date". We excluded day case, obstetric and psychiatric admissions when identifying the study population. The flow diagram for identifying the study population is shown in Fig. 1. Patients with multimorbidity (≥ 2 conditions) were included in the present analysis (n = 11,389).

Multimorbidity measure. Multimorbidity was defined as having recorded diagnoses of ≥ 2 chronic conditions 17,18 . Conditions were identified from general/acute and psychiatric admissions in the 5 years prior to index date. We used the multimorbidity measure developed by Tonelli et al. 19 . This was based on the measure developed by Barnett et al. 20 for measuring multimorbidity in a primary care population, using coding unique to primary care in the UK 21 . Tonelli et al. developed a corresponding validated coding scheme for use with administrative data based on the ICD system 19 . The specific ICD-10 codes for the 30 conditions included are detailed in Additional file 1, with a note of minor amendments made. ICD-10 codes recorded as main or other diagnoses were included.
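The lookback rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix map below is a small hypothetical subset of well-known ICD-10 categories and is not the validated coding scheme in Additional file 1, the function and variable names are invented for the example, and episodes on the index date itself are assumed to fall outside the lookback window.

```python
from datetime import date

# Hypothetical, illustrative prefix map only. It is NOT the validated
# 30-condition coding scheme referred to in Additional file 1.
ILLUSTRATIVE_CODES = {
    "hypertension": ("I10",),
    "diabetes": ("E10", "E11"),
    "chronic kidney disease": ("N18",),
    "chronic heart failure": ("I50",),
    "asthma": ("J45",),
}
LOOKBACK_YEARS = 5


def years_before(d, years):
    """Return the date `years` years before d (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def conditions_in_lookback(episodes, index_date):
    """Collect conditions coded in the 5 years before the index date.

    episodes: iterable of (admission_date, [icd10_codes]) pairs, where the
    code list holds the main diagnosis and any other recorded diagnoses.
    """
    window_start = years_before(index_date, LOOKBACK_YEARS)
    found = set()
    for admitted, codes in episodes:
        # Assumption: the index admission itself is not part of the lookback.
        if not (window_start <= admitted < index_date):
            continue
        for code in codes:
            for condition, prefixes in ILLUSTRATIVE_CODES.items():
                if code.upper().startswith(prefixes):
                    found.add(condition)
    return found


def is_multimorbid(conditions):
    """Multimorbidity is defined as two or more recorded conditions."""
    return len(conditions) >= 2


# Toy patient: the 2008 episode is older than the 5-year window.
toy_episodes = [
    (date(2010, 3, 1), ["I10", "E11.9"]),
    (date(2008, 1, 1), ["N18"]),
    (date(2013, 6, 1), ["J45.9"]),
]
toy_conditions = conditions_in_lookback(toy_episodes, date(2014, 5, 10))
toy_flag = is_multimorbid(toy_conditions)
```

In this toy case the patient has three conditions inside the window (hypertension, diabetes, asthma) and is therefore counted as multimorbid, while the chronic kidney disease code recorded in 2008 is ignored.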
Other variables. Other baseline characteristics were sex, age, deprivation, urban-rural area, and admission type. Age was categorised into six groups. Deprivation was measured using the Scottish Index of Multiple Deprivation (SIMD) 2012, categorised as quintiles (quintile 1 is the most deprived and quintile 5 the least deprived) 22 . Urban-rural status was measured using the Scottish Government sixfold Urban Rural Classification 2009/10 23 . SIMD and Urban Rural classification are identified from postcodes using the Scottish Government's publicly available look-up files 24,25 .

Data linkage. NHS Grampian SMR data were held in a dedicated secure server, managed by the accredited Grampian Data Safe Haven (DaSH) 26 . The Community Health Index (CHI) number, a unique patient identifier used throughout the Scottish health care system, was used to link the study population to hospital episode data by DaSH. Postcodes were used to link the study population to the SIMD and Urban Rural Classification. The de-identified dataset was prepared and hosted by the Grampian DaSH, allowing secure controlled access for researchers while ensuring data security.
There were 662 admissions with missing CHI numbers in 2014 (inpatient general/acute, ≥ 18 years), therefore these admissions were not included. There were 314 patients who could not be linked with SIMD, and 576 patients who could not be linked with Urban Rural Classification, because of missing or invalid postcodes (Fig. 1).
Baseline characteristics were summarised by age, sex, admission type, SIMD quintile and Urban Rural category. The overall prevalence (%) was estimated for each condition and counts of conditions were calculated.
Clustering conditions, with each condition belonging exclusively to only one cluster, is a widely used approach to identify multimorbidity clusters. However, the same condition might occur in different combinations with other conditions in different patients. Patients with these different combinations of conditions, even if they share a condition (e.g. Chronic Kidney Disease (CKD) and Diabetes, CKD and Chronic Heart Failure (CHF), only diabetes, only CKD, only CHF), might need a different plan of care.
An alternative valid clustering approach is to cluster patients instead, according to those combinations of conditions, which allows conditions to belong to more than one cluster of patients. While both methods are valid, we chose to cluster "patients" rather than "conditions", as it better aligns with the purpose of identifying clusters of multimorbidity for improved person-centred care.
Relevant diagnosed chronic conditions in the previous 5 years were used to cluster patients with ≥ 2 conditions. Conditions were coded as binary variables, with a value of "1" when the condition was present and "0" when absent. Prior to performing the cluster analysis, we evaluated whether the data contained non-random structures by visually inspecting the data (principal component analysis scatterplot, Additional file 3) and using the Hopkins statistic 27 . These showed that the data was non-random, and therefore clusterable (Hopkins 0.28).
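To make the clusterability check more concrete, the following Python sketch computes one common form of the Hopkins statistic on toy data. Several assumptions apply: definitions of the statistic vary between implementations (some raise distances to the power of the dimension, and some report the complement), and the form used here is the one in which values well below 0.5 suggest clustering, which is consistent with how the reported value of 0.28 is interpreted. The two-dimensional toy points, the function names and the sample size m are all hypothetical.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from point to its nearest neighbour in others."""
    return min(math.dist(point, other) for other in others)


def hopkins(data, m, rng):
    """One common form of the Hopkins statistic (low values suggest clustering).

    w: distance from each of m sampled real points to its nearest other
       real point.
    u: distance from each of m uniformly random points (drawn inside the
       bounding box of the data) to its nearest real point.
    """
    dims = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dims)]
    highs = [max(row[d] for row in data) for d in range(dims)]

    sampled = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data[:i] + data[i + 1:]) for i in sampled]

    u = []
    for _ in range(m):
        random_point = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        u.append(nearest_distance(random_point, data))

    return sum(w) / (sum(u) + sum(w))


# Toy data: two tight, well separated blobs in two dimensions.
rng = random.Random(0)
toy_points = (
    [[0.1 + rng.gauss(0, 0.01), 0.1 + rng.gauss(0, 0.01)] for _ in range(20)]
    + [[0.9 + rng.gauss(0, 0.01), 0.9 + rng.gauss(0, 0.01)] for _ in range(20)]
)
toy_hopkins = hopkins(toy_points, m=8, rng=rng)
```

Because real points in the toy data sit very close to their neighbours while random points mostly land in empty space, the statistic comes out far below 0.5 under this convention.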
The Gower distance 28 (equivalent to Jaccard 29 when using only binary data) was used to measure the dissimilarity between observations. The Partitioning around Medoids (PAM) algorithm 30 was used to identify distinct groups of patients with similar patterns of conditions, classifying individuals into mutually exclusive groups. The Silhouette method 31 was used as an internal validation metric to determine the optimal number of patient groups, which was the number of groups that yielded the highest silhouette value. The groups were interpreted using descriptive statistics, and the dimension reduction technique t-distributed stochastic neighbour embedding (t-SNE) was used to visualise the clusters 32 .
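The pipeline described above (dissimilarity matrix, medoid-based partitioning, silhouette-based choice of the number of groups) can be sketched as follows. This is a simplified teaching version, not the analysis code in Additional file 5: it assumes the Jaccard form of the distance for binary data, uses a greedy BUILD step and accepts any improving swap rather than searching for the best swap at each pass, and runs on six hypothetical patients with five hypothetical conditions.

```python
def jaccard(a, b):
    """Jaccard distance between two binary condition vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(rows):
    return [[jaccard(r, s) for s in rows] for r in rows]


def total_cost(dist, medoids):
    """Sum of distances from every patient to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Simplified PAM: greedy BUILD, then SWAP while the cost improves."""
    n = len(dist)
    medoids = []
    while len(medoids) < k:  # BUILD: add the medoid that lowers cost most
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    improved = True
    while improved:  # SWAP: try replacing each medoid with a non-medoid
        improved = False
        current = total_cost(dist, medoids)
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    return medoids


def assign(dist, medoids):
    """Label each patient with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: row[medoids[j]]) for row in dist]


def mean_silhouette(dist, labels):
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:  # singletons get a silhouette of 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical patients (rows) by conditions (columns), coded 1 = present.
toy_patients = [
    [1, 1, 0, 0, 0], [1, 1, 0, 0, 1], [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0], [0, 0, 1, 1, 1], [0, 0, 1, 1, 0],
]
toy_dist = distance_matrix(toy_patients)
silhouettes = {k: mean_silhouette(toy_dist, assign(toy_dist, pam(toy_dist, k)))
               for k in (2, 3, 4)}
best_k = max(silhouettes, key=silhouettes.get)
best_labels = assign(toy_dist, pam(toy_dist, best_k))
```

On this toy matrix the highest mean silhouette is obtained with two groups, which separates the first three patients from the last three. With real data the candidate range for k would be wider and the distance matrix far larger, so an optimised library implementation would normally be preferred.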
We carried out several sensitivity analyses. We compared the results obtained by: 1. replacing Gower with the Hamming distance 33 ; 2. excluding the most common condition (hypertension) from the clustering process; and 3. excluding conditions with a prevalence of < 5% from the clustering process. Analyses were conducted using STATA v13.0 and R version 3.6.1.
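The first sensitivity analysis swaps the distance measure, and a small worked example may help show why that can matter for sparse binary data. The sketch below, with hypothetical patients and a normalised form of the Hamming distance chosen for illustration, compares the two measures for a pair of patients who share one of their two conditions out of 30 possible conditions.

```python
def jaccard_distance(a, b):
    """1 minus shared conditions over conditions present in either patient."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def hamming_distance(a, b):
    """Proportion of positions that differ; joint absences count as agreement."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


N_CONDITIONS = 30


def patient_with(*present):
    """Build a binary vector with 1 at each listed condition index."""
    return [1 if i in present else 0 for i in range(N_CONDITIONS)]


# Hypothetical patients: both have condition 0, each has one other condition.
patient_a = patient_with(0, 1)
patient_b = patient_with(0, 2)

jac = jaccard_distance(patient_a, patient_b)   # 1 - 1/3, about 0.667
ham = hamming_distance(patient_a, patient_b)   # 2/30, about 0.067
```

Under Jaccard the two patients look fairly dissimilar because only conditions that are present are considered, whereas under Hamming they look close because the 27 conditions that neither patient has are treated as matches.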
Defining clusters of conditions. Prior to analysis, we documented the clusters we would expect to observe. The patterns of conditions present in the resulting groups of patients were clinically reviewed by clinical members of the study group (CB, MJ, SS). Clusters of conditions within each patient group were defined based on a combination of clinical review and assessment of the highest prevalence conditions within each patient group and labelled according to the condition with the highest prevalence.
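The prevalence-based part of this labelling step can be sketched as below. The sketch covers only the arithmetic (prevalence of each condition within each patient group, and the most prevalent condition as the label); the clinical review described above is not something code can reproduce. The patient records, group numbers and function names are hypothetical.

```python
from collections import defaultdict


def prevalence_by_group(records, labels, conditions):
    """Percentage of patients in each group who have each condition."""
    counts = defaultdict(lambda: [0] * len(conditions))
    sizes = defaultdict(int)
    for row, group in zip(records, labels):
        sizes[group] += 1
        for j, present in enumerate(row):
            counts[group][j] += present
    return {
        group: {cond: 100.0 * counts[group][j] / sizes[group]
                for j, cond in enumerate(conditions)}
        for group in sizes
    }


def label_groups(prevalence):
    """Label each group with its most prevalent condition."""
    return {group: max(prev, key=prev.get) for group, prev in prevalence.items()}


# Hypothetical data: six patients, four conditions, two patient groups.
toy_conditions = ["hypertension", "atrial fibrillation", "alcohol misuse", "depression"]
toy_records = [
    [1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0],   # group 1
    [0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 1, 1],   # group 2
]
toy_groups = [1, 1, 1, 2, 2, 2]

toy_prevalence = prevalence_by_group(toy_records, toy_groups, toy_conditions)
toy_labels = label_groups(toy_prevalence)
```

Here group 1 would be labelled "hypertension" and group 2 "alcohol misuse", and the remaining prevalences in each group show which other conditions feature alongside the labelling condition.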

Study registration. This study was prospectively pre-registered on the Open Science Framework on 26 September 2019 (https://osf.io/qnpw2). Deviations from the pre-registered protocol are documented in Additional file 4 and analysis R code is available in Additional file 5.

Ethics approval and consent to participate. This study was approved by the North Node Privacy Advisory Committee (NNPAC Ref No. 6/001/19). The remit of this Committee is to provide researchers with access to NHS patient/health data within NHS Grampian for research purposes via a streamlined approach that incorporates Sponsorship, Ethics, Caldicott & R&D. Informed consent was waived by the North Node Privacy Advisory Committee as this research falls within the conditions for processing personal data that is "necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller" (Article 6(1)(e) of the UK General Data Protection Regulation (GDPR)). Data was de-identified pre-analysis. All methods were performed in accordance with the relevant guidelines and regulations.

Multimorbidity clusters.
Cluster analysis of disease occurrence identified ten groups of patients. Table 3 describes the prevalence of all conditions in each patient group. Within each patient group, clusters of conditions have been highlighted in bold and labelled according to the most prevalent condition in each group. Other conditions that featured in a patient group (i.e. less common conditions with a higher prevalence than in other groups) are highlighted in italics. For example, Group 1 (labelled "hypertension") was characterised by hypertension (77.5%) and atrial fibrillation (59.0%). Other feature conditions were non-metastatic cancer and chronic heart failure. Group 3 ("alcohol misuse") was characterised by alcohol misuse (75.4%) and depression (54.5%). Other feature conditions in this group were asthma, cirrhosis, diabetes, epilepsy and schizophrenia.
The number of patients in each group ranged from 508 (hypothyroidism) to 2590 (hypertension). Several conditions were present in multiple groups of patients. Seven of the ten groups included hypertension, three included chronic kidney disease, two diabetes, and two atrial fibrillation. Multimorbidity clusters are summarised in Table 4. Table 5 describes the characteristics of patients in each group. Median age ranged from 51 (Group 3 alcohol misuse) to 79 (Group 8 chronic heart failure) years. The groups with the highest proportion of females were Group 6 (chronic pain), and Group 10 (hypothyroidism). The groups with the highest proportion of males were Group 3 (alcohol misuse), and Group 8 (chronic heart failure). The proportion of patients from the most deprived quintile ranged from 6.3% to 14.1%. Group 3 (alcohol misuse) had the highest proportion of patients from the most deprived and large urban areas, while Group 7 (cancer) had the highest proportion of patients from the least deprived and rural areas. Median counts of conditions ranged from 2 to 4, with Group 4 (CKD/ diabetes) and Group 8 (chronic heart failure) having the highest proportion of patients with five or more conditions. The highest proportion of patients admitted as an emergency was in Group 3 (alcohol misuse) and Group 8 (chronic heart failure).
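A per-group profile of the kind reported in Table 5 reduces to simple grouped summaries. The sketch below shows the idea for three of the characteristics (median age, proportion female, proportion in SIMD quintile 1). The patient records, field names and values are entirely hypothetical and are not taken from the study data.

```python
from statistics import median


def summarise_groups(patients):
    """Summarise each patient group by size, median age, sex and deprivation."""
    groups = {}
    for patient in patients:
        groups.setdefault(patient["group"], []).append(patient)

    summary = {}
    for group, members in groups.items():
        n = len(members)
        summary[group] = {
            "n": n,
            "median_age": median(m["age"] for m in members),
            "pct_female": 100.0 * sum(m["sex"] == "F" for m in members) / n,
            # SIMD quintile 1 is the most deprived quintile.
            "pct_most_deprived": 100.0 * sum(m["simd_quintile"] == 1 for m in members) / n,
        }
    return summary


# Hypothetical patients in two hypothetical groups, "A" and "B".
toy_patients = [
    {"group": "A", "age": 40, "sex": "M", "simd_quintile": 1},
    {"group": "A", "age": 50, "sex": "M", "simd_quintile": 2},
    {"group": "A", "age": 60, "sex": "F", "simd_quintile": 1},
    {"group": "B", "age": 70, "sex": "F", "simd_quintile": 5},
    {"group": "B", "age": 80, "sex": "F", "simd_quintile": 4},
    {"group": "B", "age": 90, "sex": "M", "simd_quintile": 3},
]
toy_summary = summarise_groups(toy_patients)
```

The same pattern extends to admission type and Urban Rural category by adding further proportions to the per-group dictionary.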

Sensitivity analyses.
Results of the three sensitivity analyses are shown in Additional file 6. The sensitivity analyses using the Hamming distance or excluding hypertension resulted in similar clusters being identified. Excluding conditions with a prevalence of < 5% resulted in 13 clusters, with some clusters split into a larger number of smaller groups compared with the main analysis. For example, the asthma and chronic pulmonary disease cluster was split into two separate clusters. However, overall, the same conditions were identified.

Discussion
To our knowledge, this is the first study to describe multimorbidity clusters in an unselected inpatient adult population, and the first population-level study of multimorbidity clusters in a Scottish/UK hospitalised population. Of 41,545 patients admitted to hospital, approximately one quarter (11,389) had multimorbidity, and our analysis identified ten clusters of co-occurring conditions.
The clusters revealed recognisable co-occurrences where the link was potentially causal, e.g. hypertension leading to atrial fibrillation 34 and diabetes leading to kidney disease 35 . Clusters also revealed shared underlying disease mechanisms, for example chronic heart failure, myocardial infarction, atrial fibrillation, stroke and kidney disease as vascular conditions of older age 36 . We identified a group of patients with a high prevalence of alcohol misuse co-occurring with depression and asthma, predominantly male and from more deprived quintiles, possibly indicating an underlying social driver. This finding supports known inequalities in alcohol-attributable harms, given that disadvantaged social groups have greater alcohol-attributable harms (admissions or death) compared with more advantaged individuals 37 . There were also clusters that represented an artefact of how conditions are classified, e.g. metastatic disease with non-metastatic cancer as two conditions in one person. Conditions with a high prevalence also had an impact; for example, hypertension was present in more than half of those with multimorbidity and was a key condition in seven out of ten clusters. While these clusters have face validity, their usefulness depends upon how they might delineate groups of people with specific health and social needs. Relevantly, those in the chronic heart failure cluster were the oldest (median age 79), and more likely to have 5 + health conditions; whereas those in the alcohol misuse cluster were more likely to be of working age (median age 51), live in a deprived area and present to hospital as an emergency. Thus, notwithstanding the artefact of associations between very common conditions, e.g. hypertension, and those that are prerequisites of another, e.g. metastatic cancer, we have shown the potential of identifying key multimorbidity clusters to which people may belong, so that we can ensure that health and social support is prioritised to the inpatient areas.
Methodological heterogeneity in studies investigating multimorbidity clusters makes it difficult to make comparisons. Studies vary with regard to the number and type of conditions included, data sources, populations, settings, and clustering methods. The most comparable study, in adult medical inpatients, identified five clusters of conditions: neurological diseases, heart/kidney diseases, malignancy, psychiatric diseases and miscellaneous diseases, from a list of 17 conditions 9 . We also identified similar chronic heart failure and cancer clusters.

Table 1. Baseline characteristics and counts of conditions, reported as n (%) for patients with < 2 conditions and patients with ≥ 2 conditions. IQR, inter-quartile range; SIMD, Scottish Index of Multiple Deprivation. a 314 patients had missing values for SIMD category (< 2 n = 279, ≥ 2 n = 35) and 576 patients had missing values for Urban Rural category (< 2 n = 488, ≥ 2 n = 88). b Rows reporting number of patients with 10 and 11 conditions have been collapsed due to counts < 5.

This was a large, population-based study, and to our knowledge, the first study to characterise patterns of multimorbidity in an unselected hospitalised population. We ascertained conditions over the 5 years prior to index date, as longer lookback periods are more effective for identifying conditions 38,39 . We used high quality administrative data, with quality assurance assessments undertaken to ensure that inpatient data items were being recorded consistently and to a high standard 40 . Our results should be generalisable to other hospitalised populations with similar characteristics, and furthermore, the methodology used in this study would be applicable to health systems worldwide.
Limitations should also be noted. Cluster analysis is an exploratory classification method, and different clustering algorithms may produce different results. We found that hierarchical cluster analysis did not produce clinically relevant clusters, and therefore have reported results from PAM. To address this, sensitivity analyses were conducted, the final clustering solution was clinically reviewed, and we have transparently and comprehensively reported our methods and deviations from the pre-registered protocol (Additional file 4). Another limitation was that, as conditions were identified from hospital episode data in the 5 years prior to index admission, we will not have recorded conditions for patients who were first-time presenters on the index date. We did not include conditions from primary care records, which will have led to underestimation of the multimorbidity burden among people with conditions predominantly looked after in primary care. However, it is reasonable to hypothesise that conditions which are rare in the hospital setting would have less influence on the health care needs of people in hospital. Finally, multimorbidity clustering was based specifically on the conditions in Tonelli's measure of multimorbidity 19 . There are many other heterogeneous measures of multimorbidity available and we acknowledge that our findings may change if other conditions are studied. However, there is no single recommended measure of multimorbidity available. Therefore, we selected Tonelli as it is a validated adaptation of the highly influential Barnett measure. The value of identifying clusters of conditions in hospitalised patients is as a first step towards identifying opportunities to target patient-centred care towards people with unmet needs.
An important next step will be to determine the clinical outcomes of patients in each cluster, reasons why patients within some clusters have poor outcomes, and the pathways through healthcare that patients in each cluster predominantly take.

Conclusions
Identifying clusters of conditions in hospital patients is a first step towards identifying opportunities to target patient-centred care towards people with unmet needs, leading to improved outcomes and increased efficiency.
Here we have demonstrated the face validity of cluster analysis as an exploratory method for identifying clusters of conditions in hospitalised patients with multimorbidity.

Data availability
The data that support the findings of this study are available in the Grampian Data Safe Haven [Dash140/DaSH326], provided the necessary permissions have been obtained. Further information is available at http://www.abdn.ac.uk/iahs/facilities/grampian-data-safe-haven.php and requests for data may be made to Professor Corri Black on behalf of Grampian Data Safe Haven, corri.black@abdn.ac.uk.